Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: add stream creation time in get stats api #632

Merged

Conversation

gurjotkaur20
Copy link
Contributor

@gurjotkaur20 gurjotkaur20 commented Jan 22, 2024

Fixes #587

Description

Added stream creation time in the get stats api.

@nitisht
Copy link
Member

nitisht commented Jan 22, 2024

Fixes #574

Is this the correct issue this PR is fixing?

@gurjotkaur20
Copy link
Contributor Author

Fixes #574

Is this the correct issue this PR is fixing?

My bad. I have corrected the issue number. Plus this PR is adding stream creation timestamp. Let me know if this is sufficient or we need first event timestamp as well?

@nitisht
Copy link
Member

nitisht commented Jan 22, 2024

Thanks @gurjotkaur20 .

Let me know if this is sufficient or we need first event timestamp as well?

Quoting @logut from issue #587,

It would me nice to be able to get the timestamp of the first event of the logstream in the stats in the api and console.

So, let's add the first event timestamp in the API as well.

@nitisht nitisht mentioned this pull request Jan 22, 2024
4 tasks
@nitisht
Copy link
Member

nitisht commented Jan 22, 2024

We can add an additional field in the stream.json file, first-event-at. This would be empty at the stream creation time, but will be updated when the first event is received on the log stream.

Then return this in the get-stats API.

@logut
Copy link

logut commented Jan 22, 2024

Thanks for working on this !
If you store statically the first event in the stream.json file, would it be updated when retention cleanup is applied ?
I would be interested more in "first event queryable for the stream" rather than "first event ever"

@nitisht
Copy link
Member

nitisht commented Jan 22, 2024

If you store statically the first event in the stream.json file, would it be updated when retention cleanup is applied ?

Great catch, we'll have to keep updating the stream time. In this case, don't you think it is better to use SQL directly @logut ?

@gurjotkaur20 gurjotkaur20 force-pushed the add-stream-creation-time branch from b5d549d to adbae20 Compare January 23, 2024 16:14
@gurjotkaur20
Copy link
Contributor Author

@nitisht I've added first event time. Please take a look. However, I've noticed one point. For example,

  • There is an event and first-event-at is set in the stream.json. It gets cleaned up during retention cleanup. Now, there's no event left in the stream. The stats still shows the event count as well as the old first-event-at time.
    Is this the expected behavior?
    Or
    Should it be handled differently?

@gurjotkaur20 gurjotkaur20 marked this pull request as draft January 24, 2024 02:14
Copy link
Member

@nitisht nitisht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO there is a simpler way, we have something called .stream.json per Stream - this file has a section called snapshot.manifest_list, this points to all manifest.json files (created per day)

To find the first query-able event time stamp, you could see the list snapshot.manifest_list, go to the first entry in this list which would be a manifest.json file, in this manifest.json, you can get the smallest p_timestamp

This approach would need max of two S3 calls and will be much cheaper computationally

@nitisht
Copy link
Member

nitisht commented Feb 6, 2024

@gurjotkaur20 now that #649 is merged, can we make progress here?

@gurjotkaur20
Copy link
Contributor Author

gurjotkaur20 commented Feb 7, 2024

Sure.

Consolidating the changes required:

  • Add stream creation timestamp creation_time in the stats api.
  • Update manifest list in snapshot during retention cleanup. Currently is it not getting updated.
  • Once retention cleanup is done, get the p_timestamp from the first file of manifest list in snapshot and set it in the first_event_at in the stats api.

@gurjotkaur20 gurjotkaur20 force-pushed the add-stream-creation-time branch from 8be084b to adbae20 Compare February 7, 2024 01:54
@gurjotkaur20 gurjotkaur20 marked this pull request as ready for review February 11, 2024 08:53
@nitisht
Copy link
Member

nitisht commented Feb 11, 2024

Thanks @gurjotkaur20 we'll review this shortly

@nikhilsinhaparseable
Copy link
Contributor

@gurjotkaur20 when I ingest one event at a time using postman, the code works as expected
but when I try ingesting using Quest, I see below error -

thread 'thread 'actix-rt|system:0|arbiter:3' panicked at actix-rt|system:0|arbiter:0thread 'server/src/storage/object_storage.rs' panicked at actix-rt|system:0|arbiter:4:216server/src/storage/object_storage.rs::thread '216actix-rt|system:0|arbiter:5:' panicked at 53' panicked at server/src/storage/object_storage.rs:21653::
parseable config is valid json: Error("EOF while parsing a string", line: 1, column: 32):

parseable config is valid json: Error("EOF while parsing a value", line: 1, column: 0)server/src/storage/object_storage.rs:53note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
216
:53:
parseable config is valid json: Error("EOF while parsing a string", line: 1, column: 32)
thread 'actix-rt|system:0|arbiter:1' panicked at server/src/storage/object_storage.rs:216::
parseable config is valid json: Error("EOF while parsing a value", line: 1, column: 0)
53:
parseable config is valid json: Error("EOF while parsing a value", line: 1, column: 0)

Steps -

  1. Run parseable server (i ran with local-store)
  2. Start ingesting with Quest
  3. The ingestion fails after a few seconds
  4. I see error in Parseable server console - thread 'actix-rt|system:0|arbiter:5' panicked at server/src/storage/object_storage.rs:216:53:
    parseable config is valid json: Error("EOF while parsing a value", line: 1, column: 0)
    note: run with RUST_BACKTRACE=1 environment variable to display a backtrace

Please check.
Do let me know if you need help in setting up Quest for ingestion.

@gurjotkaur20
Copy link
Contributor Author

gurjotkaur20 commented Feb 15, 2024

@nikhilsinhacloudsurfex I have tried running quest in container as well as on local but didn't find this issue. Please help in reproducing if you have any particular scenario in mind.

log::error!("Failed to update manifest list in the snapshot {err:?}")
}

if let Ok(Some(first_event_at)) = catalog::get_first_event(store, &stream_name).await {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to do this here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is here to update the first event timestamp after cleanup because after retention cleanup, first event timestamp would be different.

@gurjotkaur20 gurjotkaur20 requested a review from nitisht February 24, 2024 05:30
Copy link
Contributor

@nikhilsinhaparseable nikhilsinhaparseable left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

approved, good to merge

@nikhilsinhaparseable nikhilsinhaparseable merged commit e7cd586 into parseablehq:main Feb 26, 2024
6 checks passed
nikhilsinhaparseable pushed a commit to nikhilsinhaparseable/parseable that referenced this pull request Apr 20, 2024
Changes does in the PR -
1. adds the first_event_at property (from the min value of p_timestamp of the first parquet file listed in the first manifest file from the snapshot of the stream.json) to the stats api and writes it to the stream.json file at the request of get stats.
2. updates the first_event_at in case of retention

Fixes : parseablehq#587
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add first event date to logstream stats
4 participants